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Abstract — This work studies tlie recursive robust principal 
components' analysis (PCA) problem. Here, "robust" refers to 
robustness to both independent and correlated sparse outliers, 
although we focus on the latter. A key application where this 
problem occurs is in video surveillance where the goal is to 
separate a slowly changing background from moving foreground 
objects on-the-fly. The background sequence is well modeled as 
lying in a low dimensional subspace, that can gradually change 
over time, while the moving foreground objects constitute the 
correlated sparse outliers. In this and many other applications, 
the foreground is an outlier for PCA but is actually the "signal 
of interest" for the application; where as the background is the 
corruption or noise. Thus our problem can also be interpreted 
as one of recursively recovering a time sequence of sparse signals 
in the presence of large but spatially correlated noise. 

This work has two key contributions. First, we provide a new 
way of looking at this problem and show how a key part of 
our solution strategy involves solving a noisy compressive sensing 
(CS) problem. Second, we show how we can utilize the correlation 
of the outliers to our advantage in order to even deal with very 
large support sized outliers. The main idea is as follows. The 
correlation model applied to the previous support estimate helps 
predict the current support. This prediction serves as "partial 
support knowledge" for solving the modified-CS problem instead 
of CS. The support estimate of the modified-CS reconstruction is, 
in turn, used to update the correlation model parameters using 
a Kalman filter (or any adaptive filter). We call the resulting 
approach "support-predicted modified-CS". 

I. Introduction 

Most high dimensional data often approximately lies in a 
lower dimensional subspace. Principal Components' Analysis 
(PCA) is a widely used dimension reduction technique that 
finds a small number of orthogonal basis vectors (principal 
components), along which most of the variability of the dataset 
lies. To be precise, for a given dimension, r, PCA finds the r- 
dimensional subspace that minimizes the mean squared error 
between data vectors and their r-dimensional projections |3]. 
The subspace spanned by the principal components (PCs) is 
called the principal components' space (PC space). Often, for 
time series data, the PC space changes gradually over time. 
Updating the PC space recursively as more data comes in, 
without re-solving the entire PCA problem, is referred to as 
recursive PCA H). 

Notice that to find an r dimensional PC space, one needs 
at least r data vectors, usually more. Thus, even for recursive 
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PCA, the initial step needs to be a batch one or the initial PC 
space needs to be pre-specified. 

It is well known that PCA is very sensitive to outliers. 
Computing the PCs in the presence of outliers is called robust 
PCA ISl, 0, 13, iSl. Doing recursive PCA in the presence of 
outliers is referred to as recursive robust PCA dU, ||9l, ifTol . 
"Outlier" is a loosely defined term that usually refers to any 
corruption that is not small compared to the true data vector 
and that occurs only occasionally. As suggested in ifTTl . Q, 
an outlier can be very nicely modeled as a sparse vector, i.e. 
a vector whose most elements are zero, while the few that are 
nonzero can have any magnitude. We will use this definition in 
this paper In ifTTl . Q, the outlier is modeled as being spatially 
and temporally independent. In most real applications, the 
time at which the outliers begin to occur is often random and 
independent of all past times. But once outliers begin to occur, 
for some time after that they are both spatially and temporally 
correlated. In this work, we focus on this case, i.e. on recursive 
robust PCA that is robust to correlated sparse outliers. 

A key application where the robust PCA problem occurs 
is in video surveillance where the goal is to separate a 
slowly changing background from moving foreground objects 
iU, Q. If we stack each video frame as a column vector, 
then the background is well modeled as lying in a low 
dimensional subspace that may gradually change over time, 
while the moving foreground objects constitute the sparse 
outliers IfTTl ||7| which change in a correlated fashion over time. 
We will use this as the motivating application in this work. 
Other important applications include sensor networks based 
detection and tracking of abnormal events such as forest fires 
or oil spills; or online detection of brain activation patterns 
from fMRI sequences (the "active" part of the brain can be 
interpreted as a correlated sparse outlier). Clearly, in all these 
cases, one would need a real-time and fast solution and hence 
a recursive robust PCA solution is desirable. 

The moving objects or the brain active regions or the oil spill 
region may be "outliers" for the PCA problem, but in most 
cases, these are actually the "signals of interest" whereas the 
background image is the noise. Also, all the above "signals of 
interest" are sparse vectors that change in a correlated fashion 
over time. Thus, this problem can also be re-interpreted as one 
of recursively recovering a time sequence of correlated sparse 
signals, St, in the presence of large but "low rank" noise, Lt. 

Definition 1: The term "low rank" vector means that the 
n X T matrix L := [Lt-r+i, ■ ■ ■ Lt] has low-rank, i.e. its rank 
is less than min(n, r) for r large enough. This would happen 



2 



if Lt is correlated enough to have a low rank covariance matrix 
and this matrix changes slowly over time. We make this precise 
below in Sec. II-AI 

A. Problem Definition 

Our problem can be defined as follows. The measurement 
vector at time, t, Alt, is an n x 1 vector that satisfies 

Mt = Lt + St (1) 

where St is a sparse vector, with support set denoted by Tj, 
and Lt is a dense (non sparse) but "low rank" vector. The 
support set of St, Tt, can be correlated over time and space. 

Suppose, we have a good estimate of the initial PC matrix. 
Pi) ~ Po- For t > 0, our goal is to recursively keep estimating 
the sparse part, St, at each time, and to keep updating Pt every- 
so-often, by collecting the recent estimates of Lt = Mt — St- 

To make things precise, we assume that Lt satisfies 

Lt^Uxt 

where U is an unknown orthonormal matrix and xt is an n x 1 
sparse vector whose support changes every-so-often and whose 
elements are spatially uncorrected. Let Nt denote the support 
set of xt- We assume that Nt is piecewise constant with time. 
Thus, the columns of the sub-matrix, Pt :~ {U)Nt, span the 
low dimensional subspace in which the current set of Lt's 
lie and Lt = Pt{xt)Nf We refer to Pt as the principal com- 
ponents' (PC) matrix. Clearly, this is also piecewise constant 
with time. 

Every d time units, there are k additions to the set Nt, 
or, equivalently, k directions get added to Pt. When a new 
direction gets added, the magnitude of Xt along it is initially 
small but gradually increases to a larger stable value. Also, the 
values of Xt along k existing directions gradually decays down 
to zero, i.e. the corresponding directions get slowly removed 
from Pt. We provide a generative model that satisfies these 
assumptions in the Appendix. It models xt (and hence Lt) as 
being piecewise stationary with short nonstationary transients 
between the pieces that occur whenever the support of xt 
changes. 

In the video problem, the observed image is an overlay 
of the foreground and the background image. Denote the 
image, background image and foreground image written as 
a ID vector by Mt, Lt and Ot respectively. Let Tt denote 
the support of Ot and let denote the complement set 
of Tt. By overlay, we mean that {Mt)Tt = {Ot)Tt and 
{Mt)T^ = {Lt)T^. We can rewrite this in the form of ^ 
by defining St as 

If we refer to the above problem as recursive robust PCA, 
then the low dimensional vector, Lt, is the "signal of inter- 
est" while the correlated sparse vector, St, is the corruption 
(outlier). On the other hand, if we refer to it as recursive 
sparse recovery in large but spatially correlated (low rank) 
noise, Lt, then the correlated sparse vector, St, is the signal of 
interest, while Lt is the corruption (large but low rank noise). 



An example of noise Lt being much larger than signal St is 
shown in Fig. |l(c)| In the rest of the paper, we just refer to 
St as the sparse part and Lt as the low rank part. 

In this work, we focus on the Mt = St + Lt case, but our 
proposed ideas will also apply if 

Mt = ^St + Lt (3) 

where can be a fat matrix. This would be the standard sparse 
recovery problem from a reduced number of measurements 
but with the difference that the measurements are corrupted 
by very large noise. Also, this allows the signal to be sparse 
in some other basis other than the canonical basis. With the 
exception of ifTTl . which can handle another kind of very large 
but "structured" noise (the noise or outlier needs to be sparse), 
almost all other existing sparse recovery as well as recursive 
sparse recovery approaches for time sequences only work with 
small noise lEl, lEl, Ol, QSl, QS), E), lUll, lUll, ||20l 
(surely none of these will work if the noise is significantly 
larger than the sparse part). 

B. Contributions and Paper Organization 

Our first contribution is to show how our problem can be 
reformulated as a sparse recovery / compressive sensing (CS) 
plus recursive PCA problem. We call our solution Recursive 
Projected Compressive Sensing (ReProCS). Its block diagram 
is shown in Fig. |2(a)| The key idea is as follows. Assume that 
the current PC matrix Pt has been accurately estimated, i.e. we 
are given Pt « Pt. We project the measurement vector, Mt, 
into the space perpendicular to Pt to get an 7i — r dimensional 
projected measurement vector yt. Here, r is the rank of Pt. 
This projection nullifies most of the contribution of Lt. It 
is assumed that this projection does not nullify any nonzero 
component of the sparse vectors, 5*4. Recovering St from yt 
now becomes a traditional noisy CS lfT4l . ET\ . Il22l problem. 
The recovered St can be used to estimate Lt which can then be 
used to update Pt every-so-often (recursive PCA). In this work 
we misuse terminology a little and use "compressive sensing" 
or "CS" to refer to the £i minimization problem. 

Our second key contribution is to show how we can utilize 
the correlation of the sparse part, St, to our advantage to suc- 
cessfully recover St even when its support size, \Tt\, increases 
for a given rank r of Pt and as a result the number of projected 
"measurements" available for the CS step, n — r, become too 
small for CS to work. The correlated sparse outliers (e.g. 
moving foreground objects in video) can be interpreted as 
sparse signal sequences whose support change over time is 
either slow; or quite often is not slow, but is still correlated, 
e.g. the support can "move " or "expand" or change according 
to some model, over time. By using a model on how the 
objects move or deform, or other models on how the support 
changes, it is possible to obtain a support prediction that 
serves as an accurate "partial support knowledge". We then tap 
into our recent work on modified-CS (modCS) which solves 
the sparse recovery problem with much fewer measurements 
when reliable support knowledge is available ll23l . The support 
estimate of the modCS reconstruction can then be used to 
update the correlation model parameters using a Kalman filter 
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(c) The nonzero elements of St has small magnitude 5 (d) Recovery of Fig. |l(c)| 

Fig. 1: We plot Mt, St, and St at t — to + 5 for the experiment described in Sec. IVI-Al The sparse part St has large magnitude in the top 
row and and small magnitude in the bottom row. As a result, RSL works in the first case but not the second one. The support sets of 
5t's are large and generated in a correlated fashion and so the sparse matrix S = [Si, . . . StQ+200] also has low rank (rank = 45%). This is 
the reason PCP (7) does not work in both cases (also see Table U PCP does work in case of small magnitude but small support St's)- 



or any other adaptive filter We call this support predicted 
modified-CS (supp-pred-modCS). Its block diagram shown in 

Fig. pa 

This paper is organized as follows. We discuss related work 
next and define notation in Sec. The ReProCS idea is 
described in Sec. Hill The supp-pred-modCS idea and the 
overall ReProCS (modCS) approach is developed in Sec. |IV] 
We also discuss its stability over time. In Sec.[V] we explain 
the recursive PCA algorithm which is based on the idea of 
Simulation experiments and partly -real experiments (real 
background images with simulated foreground sparse image) 
evaluating the performance of ReProCS and ReProCS (modCS) 
and comparing them with other state-of-art methods - two 
recursive robust PCA methods, ifTOl and ID, two batch robust 
PCA methods, principal components' pursuit (PCP) HI and 
robust subspace learning (RSL) ||6l, and with simple thresh- 
olding based background subtraction (for the video case) are 
given in Sec. [Vll Conclusions, limitations and future work are 
given in Sec. IVIII Sections \III-B\ \IV-Q and\V\can be skipped 
on a quick reading. We mark them with a **. 

For ease of review, both our related work discussion and our 
experiments' section is very detailed. Some of this material can 
be shortened/removed eventually. 

C. Related Work 

There has been a large amount of work on robust PCA, e.g. 
Is), |l6l, and recursive robust PCA e.g. E), HOl. In most of 
these works, either the locations of the missing/corruped data 
points are assumed known ID (not a practical assumption); 
or they first detect the corrupted data points and then replace 
their values using nearby values |j9l; or weight each data point 
in proportion to its reliability (thus soft-detecting and down- 
weighting the likely outliers) ||6l, IfTOl ; or just remove the 
entire outlier vector. Approaches like ID can be adapted to 
the case where the missing/corrupted data points are unknown 
by using the outlier detection approach from other works. 



e.g. from ifTOll (we refer to the resulting method as adapted- 
ID). Detecting or soft-detecting outliers (sparse part St) as 
in ini, in, ifTOI is easy to do when their magnitude is large 
compared to that of Lt, but not when it is smaller, e.g. see 
Fig [T] and Table H] When the signal of interest is St (the case 
of recursive sparse recovery in large but low rank noise), the 
most difficult situation is when nonzero elements of St have 
small magnitude compared to those of Lt- Moreover, as we 
explain in Sec. I VII approaches such as ifTOll or adapted-Q also 
cannot work in case of too many outliers (large support size 
of St), e.g. see Table U But ReProCS is able to successfully 
recover both small magnitude and fairly large support-sized 
5t's because it operates by first approximately nullifying Lt 
and then recovering St by solving a noisy CS problem, that 
enforces sparsity of the S't's. 

In a series of recent works 0, ID, 1241 . an elegant solution 
has been proposed, that does not require a two step outlier 
location detection/coiTection process and also does not throw 
out the entire vector. It redefines batch robust PCA as a 
problem of separating a low rank matrix, L [Li, . . . ,Lt], 
from a sparse matrix, S" := [5*1, . . . , 5t], using the data matrix, 
M := [A/i, . . . , Mt] ^ L + S. It was shown in Q that one 
can recover L and S exactly by solving 

min||Lj|, + Alls'!!! subject to L + S = M (4) 

where \\L\\^: is the sum of singular values of L while \\S\\i is 
the li norm of S seen as a long vector, provided that 

« the singular vectors of L are spread out enough (not 
sparse), 

« the support and signs of S are uniformly random (thus it 

is not low rank) and 
• the rank of L is sufficiently small for a given sparsity of 

S. 

This was called Principal Components' Pursuit (PCP). While 
PCP is an elegant idea, it has three important limitations. 
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• Most importantly, PCP relies on the fact that the matrix 
5* := [S*!, ... 5*4] is sparse but full rank. But when the 
S't's are correlated and have a large support size, S will 
also often be low rank. This is particularly easy to see for 
the case where the support sets, Tt, have large overlaps 
over time. For example, for the case of Fig [T] the rank 
of S is only 45 (while n ~ 100). A low rank S makes 
it impossible for PCP to separate S from L and hence 
neither is recovered correctly. Also see Fig. |3] 

• A related issue is that PCP requires the rank of L to be 
quite small for a given support size of St, \Tt\, e.g. see Q 
Table 1]. But, real videos can have a lot of background 
variations, e.g. see Sec. lVI-Dl causing the rank of L to be 
as much as 20% of the image size. Also, the foreground 
can have large sized and multiple moving objects, making 
\Tt\ also quite large. 

• In many applications, e.g. in surveillance, one would like 
to obtain the estimates on-the-fly and quickly as a new 
frame comes in, rather than in a batch fashion. 

Our proposed algorithms, ReProCS and ReProCS(modCS), 
address these drawbacks. Unlike PCP, (a) ReProCS does not 
need the S't's to be uncorrected, and (b) it can recover 
S't's with larger support sizes [see Figs. [T] [3] and Table [III. 
Moreover, our second solution, ReProCS(modCS), utilizes the 
correlation of the S't's to its advantage in order to success- 
fully recover 5t's with significantly larger support sizes than 
what ReProCS can [see Figs H g]. Finally, both ReProCS 
and ReProCS(modCS) are recursive methods and hence they 
provide real-times estimates. Of course, our work and in fact 
any recursive robust PCA method, e.g. ifTOl . ID, does need an 
initial estimate of the PC matrix which PCP or other batch 
methods, e.g. 0, do not need. In practice, as we explain later 
in Sec. |V] this is usually easy to obtain. 

The static version of our problem is also somewhat related 
to that of 1251, 1261, 127) in that all of these also try to cancel 
the "low rank" part by projecting the original data vector into 
the perpendicular space of the tall matrix that spans the "low 
rank" part. But the big difference is that in all these, this matrix 
is known. In our problem Pt is unknown and can change with 
time, and we also show how we can utilize the correlation of 
the outliers to our advantage. 

If U itself were known, then at any given time, our problem 
would be similar to the dense error correction problem studied 
in ifTTI . i28l . Of course the reason we need PCA is because 
U is unknown and cannot even be estimated. Only, the PCs, 
Pt :~ {U)Nt can be estimated. 

Some recent work which actually is completely different 
from our current work, but may appear related (since it also 
uses CS and PCA or eigenvalue decomposition) includes l|29l 
and Ha. 

II. Notation 

The set operations U, n and \ have the usual meanings. 
For any set T C {1, • • -n}, T'^ denotes its complement, i.e., 
T^- := {i e [1, - ■ -n] : i ^ T}, and |T| denotes its cardinality, 
i.e., the number of elements in T. But \a\ where a is a real 
number denotes the magnitude of a. 



For a vector v, denotes the zth entry of v and vt denotes 
a vector consisting of the entries of v indexed by T. We use 
||t;||p to denote the £p norm of v. The support of v, supp(w), 
is the set of indices at which v is nonzero, supp(?j) := {i : 
Vi ^ 0}. We say that v is s-sparse if |supp(?j)| < s. Sorting 
\vi\ in descending order, we define the p%-energy set of v as 
Tp := {\vi\ > k} where k is the largest value of \vi\ such that 
ll^'Tplli ^ P^l^'lli' i-S-' "^Tp contains the significantly nonzero 
elements of v. 

For a matrix A, Ai denotes the ith column of A and At 
denotes a matrix composed of the columns of A indexed by 
T. We use Ati^t2 to denote a submatrix of A consisting of 
the rows indexed by Ti and columns indexed by T2. We use 
A' to denote its transpose, and At xa denote its pseudoinverse. 
For a tall matrix A, A^ ~ {A' A)""^ A' . The Frobenius norm of 
matrix A is denoted by ||A||f, i-e., \\A\\f '■— y^i^JpijF- 

For a tall matrix P, we use span(P) to denote the subspace 
spanned by the column vectors of P. 

For a diagonal matrix D, Dt denotes a submatrix of D 
consisting of the rows and columns indexed by T. In other 
words, Dt is a diagonal matrix with {DT)j.j = {D)Tj,Tj. 
Also, diag(D) denotes the vector composed of the diagonal 
elements of D, i.e., (diag(_D))i = Di^i. 

We use to denote an empty set or an empty matrix. 

III. Recursive Projected Compressive Sensing 

We explain the Recursive Projected Compressive Sensing 
(ReProCS) algorithm below. In Sec. IIII-BI we discuss the 
implicit assumptions required for it. 

A. Recursive Projected Compressive Sensing (ReProCS) algo- 
rithm 

Let Pt be an estimate of the PC matrix Pi at time t and let 
P( ^ be an orthogonal complement of Pt. The column space 
of Pt^±_ is the null space of P/, and hence Pt^± is not unique. 
Using Pt and Pt.±, we can rewrite Lt = Ptctt + Pt.±Pt and 
hence Mt as 

Mt = Ptat + Pt,±fit + St 

where at PtLt is the projection of Lt into the subspace 
spanned by Pt', and Pt {Pt,!^)' Lt is the projection of Lt 
into the subspace spanned by Pt,i_. 

To approximately nullify the low rank part, Lt, we can 
project the data vector, Mt, into the space spanned by -Pt,j_; 
i.e. compute 

yt:=AtMt, where A* := (5) 

The dimension of the projected data vector, yt, reduces to n—r 
where r := rank(Pt). Notice that 

Vt = AtSt + Pt, where ft := AtLt = AtPt{xt)Nt (6) 

If Pt ~ Pt, then AtPt ~ 0, i.e. this nullifies most of the 
contribution of the low rank part, so that Pt can be interpreted 
as small "noise". Finding the n-dimensional sparse vector, 
St, from this n — r dimensional projected data vector, yt, 
now becomes the traditional noisy sparse reconstruction / 
compressive sensing (CS) iT4l . i2Tl . Il22l problem with the 
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Algorithm 1 ReProCS 

At t = to, suppose a good estimate of PC matrix, Pt„ is 
available from training data. For t > to, do the following: 

1) Let At At-i. Obtain yt by ©. 

2) Estimate St by solving © with e = ||P/_LLt_i|j2. 

3) Support thresholding and least square estimation: do (fTOl i 
and (Ell- 

4) Estimate Lt = Mt - St. 

5) Update Pt using Algorithm |4] which is based on |]4|- 
Update At := 

6) Increment i by 1 and go to step 1). 

In practice, one can replace (|7]i by (l8]l with e = ||(/ — 

AA')^t-ill2. 



"projected noise" /St resulting from the error in estimating Pt. 
As long as \\Pt\\2 is small and At does not nullify any nonzero 
elements of St, we can recover St by solving 

min||.s||i subject to \\yt — Ats\\2 < e (7) 

s 

with e chosen proportional to the "noise" level, |j/3t||2- Denote 
its output by St- In practice, for large scale problems where 
71 is large, a less computationally and memory intensive way 
than (|7]i is to solve 

min|ls|li subject to - PtP^){Mt - s)\\2 < e (8) 

s 

This is exactly equivalent to (|7]l because Pt^± is an orthonor- 
mal complement of Pt satisfying PtPt + Pt.±Pt ± = I and 

WPi^Mth = |iA,±A',±^^dl2. 

Using St, we can then estimate 

Lt = Mt - St (9) 

which can be used to recursively update the PC matrix 
estimate, Pt, every-so-often to prevent the "noise" Pt from 
getting large (recursive PCA). We explain how to do this in 
Sec.|V] 

In (|7]i, we need an appropriate parameter e which should be 
proportional to the "noise" term ||/3t||2- We set e adaptively as 
e = ||/3i-i||2 ||P;^^£t-i||2 at time t. 

If the constraint in (|7]i is too tight (e is too small), it will give 
a solution with too many nonzero values. Moreover, as first 
explained in |31], the solution of (|7]i is always biased towards 
zero due to minimizing the £i norm. To address both these 
issues, as suggested in ISTl , we can do support estimation 
followed by least squares (LS) estimation on the support, i.e., 
we can compute 

ft = {i: iSt)^>-/} (10) 
{St)f^={{At)fJ^yt, {St)f.=0 (11) 

The low rank part, Lt, can then be estimated using this new 
St and ©. 

The block diagram of the above approach is shown in Fig. 
|2(a)| and the stepwise algorithm is given Algorithm [l] 



B. Implicit Requirements** 

Suppose that \Tt\ < s for all t, i.e. all St's are s sparse. 
Clearly, a necessary condition for the CS step of ReProCS, i.e. 
d?), to recover the support of all S'f's correctly is that At — 
{Pt,±y does not nullify any s-sparse vector, and thus does 
not nullify any nonzero part of any St- Or, equivalently, no 
s-sparse vector belongs to span(Pt). We can use the null space 
property (NSP) 1321 to show that a slightly stronger version of 
this requirement also serves as a sufficient condition, at least 
in the noise-free case. 

Consider (|7]i- Assume that Pt = Pt so that the projected 
noise is indeed zero, i.e. Pt = 0. In this case, ^ with e = 
will exactly recover any s-sparse St if the following holds 
with a 9 <1: 

mT\\i<dmT4i 

for all sets T with \T\ < s and for all 77 e null{At) ES, ifTTll . 
Here null(At) := {77 : AtJ] = 0} refers to the null space of 
At- In words, we need that all s-sparse or s-approximately- 
sparse vectors (i.e. vectors for which ||(7/)t<^||i < II('7)t||i 
for some set T with |T| < s) do not lie in null(At). But 
null(Ai) ~ span(Pf ). Thus, a sufficient condition for ReProCS 
to exactly recover St in the noise-free case (pt = and e = 
in dTjl) is that no s-sparse or s-approximately-sparse vector lies 
in span(P4). 

We expect that this should hold when the columns of Pt are 
spread out enough (not sparse), which, in turn, should hold if 
the changes of the background, Lt, are not localized in one 
or more small regions, e.g. due to water waves' motion in the 
video application. We show an example in Sec. IVI-DI This 
observation will be analyzed in future work. 

Remark L We should note that we can also get a sufficient 
condition using the restricted isometry property (RIP) 1251 . 
II33I and in fact it would hold even in the noisy case, but it 
is not as illustrative. Let 5s be the s-RIP constant l25l for the 
matrix At- If the projected noise Pt satisfies \\Pt\\2 < e and 
if 52s < \/2 — 1, the reconstruction error is bounded by a 
constant times e, i.e. \\St — St\\2 < C((52s)e l33]| . 

IV. Support-predicted MoDiFiED-CS for ReProCS 
ReProCS needs to recover a |Tt|-sparse vector, St, from 
the projected data vector yt by solving a noisy CS problem 
(|7]i. The number of projected measurements available for the 
CS step ©is n ~ r, where r = rank(_P4). For a given r, 
if the support size of St, \Tt\, increases, or if r increases 
for a given sparsity level, \Tt\, the ratio may become 
too large for CS to estimate St accurately. In this section 
we show how to utilize the correlated support changes of the 
5t's to accurately recover them even when the ratio is 
too large for CS to work. We begin by first explaining the 
model on the support change of St- We then explain the 
support-predicted modified-CS algorithm, discuss why it is 
stable, and its extensions to more general cases. The complete 
ReProCS(modCS) algorithm is summarized in Algorithm |3] 

A. Model on the support change of St 

For explaining our ideas in a simple fashion, we consider 
a simple but realistic correlation model on St inspired by the 
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Pt 
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Pt-i 



KF update 



(a) Recursive Projected Compressive Sensing (b) support-predicted modified-CS 

Fig. 2: Recursive Projected Compressive Sensing and support-predicted modified-CS 



video application. We assume S't is a ID foreground image 
with one moving foreground object that satisfies a constant 
velocity model described below. The extension to 2D images 
with multiple moving objects or to other correlation models 
is simple and is explained in Sec. IIV-DI 

Let Pt be the location of the foreground object's centroid at 
time t, let vt denote its velocity, and let w denote its width. 
Thus, its support is. 



Tt = [pt ~'W,pt + w] 



(12) 



Let 



9t 



and G := 



1 1 
1 



We assume the standard constant velocity model with small 
random acceleration on the object's motion lf34l Example 
V.B.2], i.e., 

gt = Ggt-i + 



(13) 



The modeling error (acceleration), rit, is assumed to be 
bounded, with zero mean and variance Q. 

B. Support-predicted modified-CS (supp-pred-modCS) 

The main idea of supp-pred-modCS is as follows. We use 
the above model in a Kalman filter (KF) to track the object's 
motion over time. The KF predicted location of the object and 
its size tells us its predicted support at the current time. This 
is then used to solve the modified-CS (modCS) problem and 
obtain an updated support estimate. The centroid (or median) 
of this support estimate tells us the observed location of the 
object, which may be erroneous because our support estimate 
is not perfect. This then serves as the noisy observation for 
the KF update step to update the current location and velocity 
estimates. We now explain each of these four steps. 

Predict Location: 

Let gt^z = Vt\z]' denote the estimate of gt at time t 
given measurements up to and including at time z. Similar rule 
applies for T^i^. Let l^t\t-i^ ^t\t ™d Kt denote the prediction 
and updated error covariance matrices and the Kalman gain 
used by the KF. Compute 



gt\t-i = G .9t-i|t-i, and 
'^t\t-i = G St_i|t_i G' + 



Q 



(14) 
(15) 



Algorithm 2 Support-predicted modified-CS 



1) Predict centroid by dUli and ( fTSl l 

2) Predict support by ( [20] i 

3) Update support 

- Modified-CS: solve (EB with T 

\\P't.^Lt-ih. 

- Add-LS-Del procedure: 



Tt and e 



T;dd = TU{ier^: >aadd} (16) 

(sk.. = ((AOT„.J^yt, =0 (17) 

Tt\t^T.^M \ e T^M ■ < "del} (18) 

{St)f^^^={{At)fJ^yt, {St)f.^^=0 (19) 

4) Update centroid by ((25) and ( |26] l. 

In practice, one can replace ( ISTT i by 

min, llsT-lli subject to \\{I - PtPi){Mt - s)\\2 < e 
with e= ||(J- AA')Vi||2 



Predict Support: 

Using the location prediction from above, compute the 
support prediction as 



t-l = [Pt\t-i - W,Pt\t-i +w] 



(20) 



Recover St and Update its Support using Modified-CS: 

Assuming T ~ Tt\t-i is a good support prediction, we 
can use it as the partial support knowledge for modCS which 
solves 



ST-lli subject to \\yt - Ats\\2 < e 



(21) 



i.e., it tries to find the solution that is sparsest outside the 
set T among all solutions that satisfy that data constraint. Let 
s be the solution of i2T[ with T = Tt^t-i- As explained in 
ll35l . s is biased towards zero along (because of the £i 
term) and it may be biased away from zero along T (there is 
no cost on st and the only constraint is the data constraint). 
The elements which are missing from the support prediction, 
A(|f_i := Tt\T are a subset of where as the extra elements 
in the support prediction, A^^t\t-i T\Tt are a subset of T. 
If we use a single threshold for support estimation as in ( fTOl i. 
we will need a small threshold to ensure that most elements of 
At\t-i get correctly detected, but a large one will be needed 
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Algorithm 3 ReProCS(modCS) 

At t = to, suppose a good estimate of PC matrix, Pt„ is 
available from training data. For t > t^, do the following: 

1) Let At At-i. Obtain yt by dSj. 

2) Estimate St by Algorithmic 

3) Estimate Lt = Mt - St- 

4) Update Pt using Algorithm |4] which is based on Q. 
Update At := 

5) Increment t by 1 and go to step 1). 



to ensure that elements of Ag t\t~i ^6 deleted. Thus for a 
given threshold, one or the other cannot be done well. 

A better approach is to use the Add-LS-Del procedure 
summarized in step 3 of Algorithmic This was first introduced 
in our older work |[T9|| , 1361 and simultaneously also in ifTSl , 
lfT6l . The addition step threshold, a^dd, needs to be just large 
enough to ensure that the matrix used for LS estimation, At.^^^^ 
is well-conditioned. If a-^^id is chosen properly, the LS estimate 
on Tadd will have smaller error than the modCS output. As a 
result, deletion will be more accurate when done using this 
estimate. This also means that one can use a larger ajei to 
ensure quicker deletion of extras. We denote the final support 
estimate output of the Add-LS-Del procedure by T(|t. 

Update Location: 

The centroid of the updated support estimate, Tt\t, obtained 
above, serves as the "observed" location for the KF, pt_obs, i-e. 

Pt.obs = centroid(ft|t) := -J— ^ i (22) 

The following is a valid model for pt obs 

Puohs = Hgt+ujt, H :^ [1 0] (23) 

where LOt is the observation error, which is bounded, zero 
mean and has a variance R. The observation error arises 
because there are extras and misses in Tt\t hence ft,obs = 
centroid(T't|t) ^ centroid(Tt) = pt- In practice, using the 
median instead of centroid provides a better observed estimate 
especially when there are many extras in the support estimate. 
We discuss how to set R in Sec. IIV-DI 

The KF update step is as follows 1341 Chapter 5]. 

Kt = H' {H ^t\t-i H' + (24) 

9t\t = 9t\t-i+ Kt {pt^obs - H gtit-i) (25) 
St|t=St|t-i- A't (26) 

The above algorithm is summarized in Algorithm |2] and 
also in the block diagram of Fig |2(b)| A complete algorithm 
incorporating the idea of supp-pred-modCS into ReProCS is 
given in Algorithmic] 

C. Stability: Main Ideas** 

The ReProCS algorithm is a recursive approach. In 
ReProCS(modCS), the sparse recovery algorithm, support- 
predicted modified-CS, is itself recursive. Hence an important 



question is when and why will it be stable (error bounded 
by a time-invariant and small value)? Our simulations given 
in Sec. |Vl] do indicate that it is stable. In this section, we 
intended to provide the main arguments of why this can be 
shown analytically. Due to lack of space, we have moved 
this discussion to Supplementary Material. Based on reviewer 
comments, it can later be incorporated into the main paper by 
shortening other sections. 

D. Discussion and Extensions** 

It is easy to see that, if pt is the centroid of Tt\t, then the ob- 
servation error can be bounded as \uJt \ < l^t|t| 2u)+i-|At|t| 

I maXjEA ,|, |i-Pt| , .... , ... ,. 

|Ae^t|t| ^ . We show this m the stability discus- 
sion (see Supplementary Material). Denote this bound by B. 
A possible way to set the variance, R, of the observation error 
ujt is to assume that it has the maximum variance distribution 
for a given bound (i.e. is uniformly distributed). Using this, 
R = But doing this will require roughly knowing 
the final number of misses and extras. This will depend on 
the intensity distribution of the moving object and of the 
background. It can be estimated only if a training sequence 
with the same object is available. But even without this, the 
following qualitative fact always holds. If the background 
and foreground values are quite different, the updated support 
estimate will be more accurate. So, clearly ujt will be small. 
This can also be seen from the bound on |cJt| which is directly 
proportional to the updated support errors. In this case, R 
should be smaller. In general, R should be of the same order 
as Q if we want the KF to compute a weighted average of 
the new observed location and the predicted one. The smaller 
(larger) the value of R compared to Q, the greater (lesser) the 
KF depends on the observed location (centroid/median of the 
updated support). 

For the sake of simplicity, we presented our idea of supp- 
pred-modCS for a ID image sequence using a very simple 
correlation model on the support change of the sparse part St- 
The sparse part was assumed to contain a single translating 
object. However, we can extend it to other more general cases. 
The extension to 2D image sequences is easy. Position and 
velocity will each be a two dimensional vector The support 
prediction and the location update steps will need simple 
changes to handle 2D motion. The extension to multiple ob- 
jects is also easy if they have sufficiently different intensities; 
if their supports do not overlap for too long; and if their motion 
is independent. One can have a separate KF for tracking the 
motion of each object. To obtain the observed locations of 
the different objects one can use intensity thresholds to obtain 
their respective supports. This is done in our experiment of 
Sec. rVLO 

The case of multiple and changing number of objects; 
objects that can enter and leave the scene; and objects that 
cannot be separated by intensity thresholding but need more 
sophisticated segmentation techniques will be studied in future 
work. Also, in the current work we only assume translating 
objects. The extension to any other linear model (e.g. affine 
deformation model in case of video or any other linear model 
in other applications) is easy. Finally, so far we have not used 
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the fact that the support will consist of one or a few contiguous 
blocks in the support update step. Using this can significantly 
help remove arbitrary extras. One way to use this fact is to 
use the model-based CS idea of ifTSl. 



Algorithm 4 Recursive PCA 



V. Recursive PCA** 

As explained in Sec. lI-AI Lt = Uxt where U is an unknown 
orthonormal matrix and xt is a sparse vector The support set 
of Xt, Nf, changes every-so-often and the nonzero elements 
of Xt are spatially uncorrected. Thus, Pt also changes every- 
so-often. One realistic generative model on xt and hence on 
Lt is given in the Appendix. When new PCs appear, we 
need to detect them timely before the projected noise, f3t, 
seen by CS gets too large. When some old PCs vanish, we 
also need to remove them from Pt. Otherwise, r = rank(Pt) 
will keep increasing and therefore the number of projected 
measurements, n — r, will keep decreasing and may become 
too small for CS or modCS to work. 

At the initial time, t = to, if the training sequence, 
[Ml,-- - ,Mtg], does not contain any sparse part, we let 
[Li, - - - , Ltg] = [Ml, - - - , MffJ. Usually such a sequence is 
easy to obtain, e.g. in video this means having a sequence with 
no foreground moving objects. If [Mi, - - - , Mt„] does contain 
the sparse part, but its support is small enough or uncorrected 
enough so that PCP works, then [Li, - - - , Ltg] can be obtained 
by solving PCP i.e. ©, with M = [Mi, - - - , MtJ. 

The initial PC matrix, Pt„, is estimated by computing a 
singular value decomposition (SVD) of [Li, - - - , Ltg], and re- 
taining singular values above a threshold ao- i-C-, we compute 
[Li, - - - , Ltg] PAV; set T ^ {i : (A)^,, > uq} and set 
Ptg ^ Pt, At„ -s— At.t, Vtg -f- Vt- Thus, all but the singular 
values above ao are zeroed to give a truncated SVD that 
approximates the data. Usually, ao is picked according to the 
distribution of the singular values. If the matrix [Li, - - - , Ltg] 
is exactly low rank, ao can zero. Otherwise it can be picked 
so that T is the p%-energy set of diag(A) for p large enough. 

We use Vt to only explain how we update Pt, but we do not 
need to compute and store Vt- 

For t > to, we obtain Lt by ReProCS. In the recursive PCA 
step of ReProCS, we update Pt every r frames. Alternatively, 
we can also do this whenever e = ||(/ — Pt-iPt_i)Lt-i\\2 
exceeds a threshold. 

The PC update is done as follows. The complete algorithm 
is summarized in Algorithm |4] At t = to + kr, let D = 
[Lt-T+i,'' ■ ,Lt]. In step 2a), we first estimate the variance 
of D along the columns of Pt-i and remove the PCs along 
which the variance is below a threshold a. Let CTmin be the 
smallest nonzero singular value of Af„. We use a = 0.5cr^i„. 
Note that after doing step 2a), the column vectors of Pf-i 
contain all the non-decayed PCs. 

In step 2b), we rotate Pt-i and find the new PCs based 
on the idea of incremental SVD l?). We first decompose the 
new collected data D into two components C and E, where 
C = Pl-iD and E = D — Pt-iC. The parallel component 
C rotates the existing singular vectors and the orthogonal 
component E estimates the new PCs ID. Let E JK 
be a QR decomposition of E. Notice that Pt-i and J are 



Initialization: Compute [Li, - - - ,Lta] PKV; set T 
{i : (A),,., > ao}, Ao ^ Pt, Kg ^ At^t- 
Let = 0. At each time t > to, store Lt in D, i.e., D 
[D Lt]. Do the following: 

1) If there are less than r frames in D, 

- Let Pt ^ Pt-i, At ^ At-i. 

2) If there are t frames in D 

2a) Remove decayed directions from Pt-i 

±£ = -PUDD'Pt-1 

T 

N£^{i: (Ee),,, > a} 
Pt-i'^ {Pt-i)f]^, At_i (At_i)^^_^^ 
ft-i ^rank(Pt-i) 

2b) Update Pt and A* by incremental SVD 

- Compute C = P^^iD and E = D - Pt-iC 



QR 

- Compute QR decomposition of E: E = JK 



- Compute SVD of 

PrArVl 



At^iC 
K 



At^iC 
K 



SVD 



- Update Pt and At as 

Pt ^ [Pt-l J]Pr, At ^ Ar 

2c) Add new PCs with large variance 



Nt = {!,■■■ ,ft-i}^{i>ft~i 
Pt^{Pt)M^, At^{At)^^ 



(AO 



> a) 



2e) Reset D 



orthogonal, i.e., Pl^iJ = 0. The column vectors of J are the 
basis vectors of the subspace spanned by the new PCs. It is 
easy to see that H 



[Pt-iAt-iVt'-i D] = [Pt-i J] 



'At-i C 




'Vt-i 0" 


K 




/ 



Let 



At-iC 
K 



PrArV;. Clearly, 



[Pt-lAt-lVU D]''^"" {[Pt^i J]Pr)Ar{ 



14-1 
/ 



VrY 



In words, Pr rotates the old PCs, Pt-i, and the new basis 
vectors, J, to the current PCs, i.e., Pt = [A-i J]Pr- Also, the 
singular values along Pt are the diagonal elements of At = A,.. 

Let ft_i = rank(_Pt-i) and let Na = {n-i + 
1, - - - ,rank(A)}- If A-i « Pt-i, the old PCs are already 
correctly estimated and do not need to be rotated. Under this 
assumption, {Pt)ff^ contains the new PCs. The variance of D 
along the columns of (Pt) jy^ is given by the diagonal elements 
of ^(At)^ . Therefore, to only retain the new PCs with large 
variance, we threshold on 7(Af)^ in step 2c). 
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Recall that n — r (where r = rank(Pt)) is the number of 
projected measurements and f3t is the noise seen by CS or 
modified-CS (see (|6]l). By choosing different values of the 
threshold, a, there is a tradeoff between making /3t small 
and keeping n — r large. A smaller a means we retain more 
directions in Pt. This means that n — ?- is smaller (fewer 
measurements) but the projected noise, (3t = (I — PtPDLt 
is also smaller When the nonzero elements of the sparse part, 
St, have very small magnitudes, we need a smaller a to ensure 
that the noise, Pt, is sufficiently small compared to 5*4. 



VI. Experimental Results 

We compare the performance of ReProCS and Re- 
ProCS(modCSil] with two recursive robust PCA methods - 
incremental robust subspace learning (iRSL) ifTOl . adapted 
(outlier-detection enabled) incremental SVD (adapted-iSVD) 
m, and two batch robust PCA methods - principal compo- 
nents' pursuit (PCPjl j7|, robust subspace learning (RSLjl IS). 
We also show a comparison with simple thresholding based 
background subtraction (BS) for the last experiment. 

In Sec. IVI-AI and Table U we compare ReProCS with 
iRSLQUl, adapted-iSVDit?!, PCP fT\ and RSL f6l. As can 
be seen, iRSL and adapted-iSVD fail in case of both small 
magnitude S'f's and large support-sized St^- RSL can handle 
large support size, but not small magnitude S'f's while the 
opposite is true for PCP. In later comparisons we only compare 
against PCP and RSL. In Sec. [VLBl Table M and Fig. |3] we 
give a detailed comparison of ReProCS and PCP. PCP has 
large error whenever the support sets of the S'f 's are correlated 
and their sizes are large. Comparison with RSL is also shown. 

In Sec. IVI-CI and Fig. |4l we show an example where 
ReProCS does not work and ReProCS (modCS) is needed. 
Here the support size of St, \Tt\ ~ 0.5 In while the number 
of projected measurements is only about n — r k, 0.8n. 

In Sec. IVI-DI we show comparisons on a partly-real video 
sequence where we overlay simulated sparse foreground im- 
ages on real background images. The background sequence 
has nonzero mean. The background variations, e.g. those due 
to water waves, lead to the low-rank part having covariance 
matrix rank as large as 20% of image size. Because the 
variations due to the water waves are not localized in one 
or more small image regions, the resulting projection matrix. 
At = {Pt.±y, is "incoherent" enough compared to the sparse 
foreground image and so CS and modCS work. 

All our code will be posted on the first author's webpage, 
\http://www. ece. iastate. edu/^chenlu\ Videos of all the above 
experiments and some results on fully real video sequences 
are also posted here. 

'we use YALLl ii minimization toolbox (37) to solve and j2U . Its 
code is available at http://yalll.blogs.rice.edu/ 

^We use Accelerated Proximal Gradient algorithm l381 and Inexact ALM 
algorithm 1391 (designed for lar ge scale problems) to solve PCP j4). The co de 
is available at http:/ /perception.csl.uiuc.edu/matrix-raiik/sample_code.htmlj 

^The code of RSL is available at 

http://www.salleurl.edu/ ftorre/papers/rpca/rpca.zip. 



A. Comparison of ReProCS with adapted-iSVD, iRSL, RSL 
and PCP 

In this experiment, the measurement at time t, Mt :~ 
Lt + St, is an n X 1 vector with n = 100. The low rank 
part, Lt — Uxt, where [/ is an n x n orthonormal matrix 
(generated by first generating an n x n matrix with entries 
randomly sampled from a standard Gaussian distribution, and 
then orthonormalizing it using the Gram-Schimidt process) 
and is an 71 X 1 sparse vector with support set Nt 
with size \Nt\ ~ 0.2n. Thus the PC matrix at time t is 
Pt := (J7)wf The sparse vector, Xt, is simulated using the 
regression model described in the Appendix with / = 0.5, 
fd = 0.1, and 9 = 0.5. Initially there are 20 PCs with 
variances lO", 0.7079 x 10^,0.70792 x 10^, • • • ,14.13. At 
t = to + 5, two new PCs with variance 50 and 60 enter the PC 
basis, Pt, and the values of xt along two old PCs start to decay 
exponential to zero. The sparse part, S't, is an n x 1 vector 
with its support set denoted by Tt- For I < t < to — 2000, 
St = and hence Mt = Lt- For t > to, the support set of 
St, Tt, is generated in a correlated fashion: St contains either 
one or four nonzero strips (small and large support size cases). 
Each strip has 9 nonzero elements and it can either stay static 
with a large probability 0.8 or move top/down with a small 
probability 0.1 independently of the others. Thus the support 
sets are both spatially and temporally highly correlated. The 
magnitude of the nonzero elements of St is fixed at either 100 
(large) or 10 (small). The two cases are plotted in Fig. [T] 

We tabulate our results in Table H] The first to frames can 
be used to obtain the initial PC matrix, Pt^, by standard PCA 
on ,Mto] ~ [-^i,''' i^to]- The same Pt^ is used 

by ReProCS (Algorithm [T]) and by the two other recursive 
algorithms - iRSL HOI and adapted-iSVD 14]. For ReProCS, 
the support estimation threshold in dTol ). 7, can be set a little 
lower than the minimum nonzero magnitude that we would 
like to correctly detect. Thus we set 7 = amini^Tt \{St)i \ for 
an a < 1. We used a ~ 0.2 in the large magnitude St case, 
but a = 0.3 in the small magnitude St case (using a larger 
a in this case ensures that 7 is greater than the noise level). 
We update Pt for every t = 20 frames by Algorithm 2] with 
a = 5. iRSL flOl solves the recursive robust PCA problem 
by weighting each data point according to its reliability (thus 
soft-detecting and down-weighting likely outliers). If the final 
goal is outlier detection (e.g. detecting foreground moving 
objects), this is done in lITOI by thresholding on the difference 
between the current data vector and its projection into the 
PCA space, i.e. by thresholding on (/ — PtP^)Mt. We also 
compare against an adapted version of |4l (we call it adapted- 
iSVD). We provide adapted-iSVD the outlier locations also 
by thresholding on (/ — PtP[)Mt. It fills in the corrupted 
locations of Lt by imposing that Lt lies in span(Pt). We used 
a threshold of 0.5minigTt |(5't)i| for both iRSL and adapted- 
iSVD (we also tried various other options for thresholds but 
with similar results). PCP fTl is a batch method that needs 
to wait until all data frames AI = [AIi , • • • , Mt] are available 
and then estimates the low rank part L = [Li, • • • , Lt] and the 
sparse part S = [6*1, •• • ,St] simultaneously by solving the 
convex optimization problem of (|4]i with A = 1/ ■y/max(n, t) 
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as suggested in Q. RSL ||6| is another state-of-art batch 
robust PCA method that aims at recovering a good low-rank 
approximation that best fits the majority of the data. RSL 
solves a nonconvex optimization via alternative minimization 
based on the idea of soft-detecting and down-weighting the 
outliers. For RSL and PCP, St is not exactly sparse and 
therefore we estimate the support, Tt, by thresholding on St 
with a threshold picked in the same way for adapted-iSVD 
and iRSL. 

iRSL was designed for a video application and hence it di- 
rectly detects the foreground overlay's support by thresholding 
on (/ — PtPt)Mt (it never computes St). Thus, in this exper- 
iment, for all algorithms, we compare the reconstruction error 
of the foreground overlay, {Ot)Tt = {Mt)Tt and {Ot)T'' = 0, 
where Tt is the support set of St- In Table HI we compare the 
normalized mean square error (MSB), E||0 - d|||/E||0|||, 
where O := [Otg+i,--- ,Ot„+ioa]- The expected value is 
computed using 100 times Monte Carlo averaging. 

As can be seen, ReProCS gives small error in all four 
cases. ReProCS is able to successfully recover both small 
magnitude and fairly large support-sized 5f's because it op- 
erates by first approximately nulUfying Lt, i.e. computing 
yt := (/ — PtPl)Mt, and then recovering St by solving 
(|7]i that enforces the sparsity of St- Adapted-iSVD and iRSL 
also approximately nullify Lt by computing the same yt, 
but they directly use yt to detect or soft-detect (and down- 
weight) the support of St- Using ([T]i, yt can be rewritten as 
yt = St + i-PtPlSt) + I3t where Pt (/ - PtPl)Lt- As the 
support of St increases, the interference due to {—PtPlSt) 
becomes large, resulting in wrong estimates of 5*4. Since 
adapted-iSVD and iRSL are recursive methods, this, in turn, 
results in wrong PC matrix updates, thus also causing Pt to 
become large and finally causing the error to blow up. Using 
yt directly to detect the support of St also becomes difficult 
when the magnitude of the nonzero 5f's is small compared to 
I3t and this is another situation where adapted-iSVD and iRSL 
fail. RSL is able to handle larger support size of StS because 
it uses the entire sequence of to + 100 A/j's jointly and the 
first <o = 2000 MtS have St = 0. But when the magnitude 
of the nonzero St's is small, RSL also fails. PCP is able to 
always deal with small support sized 5't's (even when their 
magnitude is small). But it fails for large support sizes of St's, 
particularly since they are also correlated. PCP is much more 
robust to independently generated 5f's with similar ratios of 
\Tt\/{n — r) (shown next). 

B. Recovery of random or structured sparse outliers 

In this experiment, we consider the problem of track- 
ing moving blocks in a simulated image sequence of size 
32 X 32 X (to + 200). Thus, the data vector at time t, Mt, 
is n = 32^ = 1024 dimensional. Note that Mt = Lt + St with 
Lt and St generated as described below. We let Lt = Uxt 
where U is an orthonormal n x n matrix generated as before 
and xt is an n dimensional sparse vector The sparse vector, 
Xt, is simulated using the regression model described in the 
Appendix with / = 0.5, fd = 0.1, and = 0.5. In Table 
IHI(a)| and Table liy(b)] initially there are 32 PCs with variances 



10^ 0.80 58 X 1 0^ 0.8058^ x 10^ • • • ,12.4. In Tabljg(c)l 
and Tabl dm(d)] initially there are 128 PCs with variances 
10^, 0.9471 X 10^, 0.9471^ x lO'', • • • , 10. In all four cases, 
at time t = to + 5, two new PCs with variance 50 and 60 
enter the PCs basis and the value of xt along two old PCs 
begins to decay exponentially to zero. At time t ~ to + 50, 
another two new PCs with variance 55 and 65 enter the PCs 
basis and the value of xt along two old PCs begins to decay 
exponentially to zero. For 1 < t < to, St = and hence 
Mt = Lt- For t > to, St has nonzero elements. Each nonzero 
element of St has constant value 5, which is much smaller 
than those of it's. We generate the support of St, Tt, in the 
following two different ways: 

• In Table im(a)| and Table im(c)[ for a given support size 

\Tt\, Tt is generated uniformly at random. 
. In Table |IJ(b)] and Table Tt is spatially and 

temporal correlated as described below. At each time 
t > to, there are several 7x7 blocks having constant 
pixel value 5. All other entries in St are zero. Each 
block can either stay static with probability 0.8 or move 
independently one pixel step towards top/bottom/left/right 
with probability 0.05. 

For ReProCS, we use the first to = 10^ frames, 
[Ml, • • • Mt„] — [Li, ■ ■ ■ Lt„], as training data to get an initial 
PCs' estimate Pt„ via computing its SVD. For t > to, we do 
Algorithm [U to recursively estimate Lt and St- The support 
estimation threshold in ( fTol i. 7, can be set a little lower than the 
minimum nonzero magnitude that we would like to correctly 
detect. We set 7 = 0.2mini£Tt \iSt)i\ = 1- We update Pt 
for every r = 20 frames by Algorithm |4] with a = 5. 
The reconstruction time of ReProCS is about 0.015 seconds 
per frame. To make a fair comparison, we use the entire 
sequence, [Mi, ■ ■ ■ Mto+200], to do PCPfTl and RSL116]|. The 
reconstruction time of offline PCP, (total time)/(to + 200), is 
about 0.05 seconds per fram^ The reconstruction time of 
offline RSL, (total time)/(to + 200), is about 0.03 seconds 
per frame. But if we had to do PCP or RSL for a causal 
reconstruction, i.e., at each time t, estimate St and Lt by 
solving PCP or RSL using all available data [Mi, ■ ■ ■ , Mt], 
PCP will take about 500 seconds and RSL will take about 
300 seconds at each time. 

We summarize the reconstruction errors of PCP and Re- 
ProCS in Table nil based on 100 times Monte Carlo averaging. 
The normalized MSE E(||L - Li||,)/E(||S'|||,) and E{[[S - 
S[[%)/E{[[S[[p) are computed only for the last 200 frames. As 
can be seen from Table [III PCP has large reconstruction error 
for less sparse and/or correlated StS- However, ReProCS can 
recursively recover Lt and St accurately, with reconstruction 
error less than 10~^ in all cases. In Fig. |3(a)| and Fig. |3(b)[ we 
plot the normaUzed MSE (NMSE) of Lt and St for the third 
case in Table [Ig(d)] where 1^ « 28.71%. We also show the 
error of RSL. Since the nonzero elements of St have much 
smaller magnitude than the elements of Lt, RSL also does not 
work. In Fig. |3(c)| we show Sto+2oa obtained by ReProCS, 
PCP RSL [61 as an image. 

''PCP using the last 200 frames takes about 0.02 seconds per frame but it 
gives even larger errors for Lt and St 
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(a) (St)^ = 100, Vi g Tt and (St)^ = 0,Vj S Tf 



(b) {St)i = 10, Vi e Tt and = 0,Vj e T^^ 



l^tl/n 


ReProCS 


adapted-iSVD 


iRSL 


RSL 


PCP 


9% 


0.00002 


0.0381 


0.9817 


0.0001 





36% 


0.0119 


0.3172 


0.9813 


0.0445 


0.5553 



\Tt\/n 


ReProCS 


adapted-iSVD 


iRSL 


RSL 


PCP 


9% 


0.0480 


0.1949 


0.9890 


4.9618 


0.0946 


36% 


0.0563 


0.4043 


0.9920 


1.2482 


0.5786 



TABLE I: Normalized MSE E||0 - 0|||,/E||0|||, of various metliods. The experiment is described in Sec. lyFAl Here, \Tt\/n is the 
sparsity ratio of St- 

(a) rank(L) = 36, supp(5) is uniformly random 



(b) rank(L) = 36, supp(S') is correlated 



\Tt\ 


PCP 


ReProCS 




Tt! 


PCP 


ReProCS 


n 


EWL-L^p 


E||S-S||J, 


E||L-L||J, 


EIIS-SIIJ. 




n 


E||L-L||J. 


E|jS-S||J. 


E||i-i|| J. 


EIIS-SIIJ, 




n\L\\'i 


nswj. 


E\\L\\l 


EWSWj, 






E||i||^p 


E\]S\\'j, 


EWLWj, 


E||S|pp 


9.57% 


1.42 X 10"* 


3.0 X 10-3 


1.32 X 10-* 


2.8 X 10-3 




9.57% 


9.6 X 10-* 


0.022 


2.80 X 10-* 


6.4 X 10-3 


19.14% 


5.58 X 10-* 


5.8 X 10-3 


1.81 X 10-* 


1.9 X 10-3 




19.14% 


2.76 X 10-2 


0.333 


4.75 X 10-* 


5.7 X 10-3 


28.71% 


1.80 X 10-3 


1.25 X 10-2 


2.42 X 10-* 


1.7 X 10-3 




28.71% 


5.81 X 10-2 


0.502 


6.52 X 10-* 


5.6 X 10-3 




(c) rank(L) = 


132, supp(5) is 


uniformly random 




(d) rank(L) 


= 132, supp(S') is correlated 




\Tt] 


PCP 


ReProCS 




\Tt\ 


PCP 


ReProCS 


n 


E\\L-L\\%, 


E||S-S|||. 


E||L-f,|| 1 


EIIS-Slll 




n 


E||L-I,|||, 


E||S-S||J. 


E||i-I,|| J, 


E||S-S|||, 








EllilPp 


E||S||^p 






E||i|| p 


nswj. 


EWLWj, 


E||S||^p 


9.57% 


1.69 X 10-* 


1.32 X 10-2 


4.98 X 10-5 


3.9 X 10-3 




9.57% 


4.61 X 10-* 


0.04 


1.03 X 10-* 


9.1 X 10-3 


19.14% 


6.55 X 10-* 


2.55 X 10-2 


6.89 X 10-5 


2.7 X 10-3 




19.14% 


7.5 X 10-3 


0.35 


1.74 X 10-* 


8.1 X 10-3 


28.71% 


1.70 X 10-3 


4.33 X 10-2 


1.05 X 10-* 


2.7 X 10-3 




28.71% 


1.64 X 10-2 


0.53 


2.47 X 10-* 


8.1 X 10-3 



TABLE IT Comparison of reconstruction errors of PCPQ and ReProCS for separating low rank L and sparse 5*. Here, L and S are of 
size n x 200 and n = 1024; the rank of L, rank(L), is much smaller than n; and the support of S is generated either uniformly at random 
or correlated. Note that \Tt \ is the number of nonzero elements in St- 



We also verified that ReProCS keeps tracking the change of 
PCs gradually and correctly. At t = to, we have a good Ptg 
that finds all the existing PCs. At t ~ to + b, let Ui denote the 
two new PCs that get added and let U2 denote the two old PCs 
that start to be removed from span(Pf). At t = to, we have 
^fj^ = (Ui is not in span(A)) and "[^^^.f^ = 1 (U2 is 
in span(Pf)). We updated Pt for every t = 20 frames using 
Algorithm m with a = O.Scr^. =5. Alt = to + 200, ^-WjM^ 

II ^1 IIf 

II P' Uo 

increased from to 0.91 and decreased from 1 to 

111.^211 p 

0.11. Therefore, new PCs gradually and correctly got added 
to span(Pt) and the decayed PCs gradually got removed from 
span(Pt). As explained in Sec.|V] we can remove the decayed 
PCs more quickly by using a larger threshold a. 

C. Using correlation of the sparse part: the need for supp- 
pred-modCS and ReProCS (modCS) 

In this section, we show an example where ReProCS 
(modCS) is needed. The image sequence is of size 64 x 80 x 
(to + 100) with to = 5000. Thus, the measurement at time t, 
Mt, is of length n = 64 x 80 = 5120. We generate Mt = 
Lt + St as follows. We set Lt ~ Uxt where U is generated as 
before and xt follows the regression model of the Appendix 
with / = 0.5, fd = 0.1, and 9 = 0.5. Initially, there are 0.2n 
PCs with variance lO", 0.9933 x 10^, 0.99333 ^ io4, . . . , 10. 
At t = to + 5, two new PCs with variance 50 and 55 get 
added and the value of xt along two old PCs starts to decay 
to zero. Compared to the previous experiment, the rank of the 
PC matrix, r = rank(Pt) w 0.2?!, is larger. For 1 < t < to, 
St = and hence Mt = Lt- For t > to, there are two 
45 X 29 moving objects in St- Thus, the support size of St 
is \Tt \ ~ 0.51n while the number of projected measurements, 
yt, is ?i — r w 0.8n. The two blocks move independently 
following the motion model ( fT3] l where rit is a zero mean 
truncated Gaussian noise with variance Q = 2.5 x 10-'', 



i.e., nt - A/'(0,Q) and < \nt\ < The first 

block has constant pixel intensity 10 and it moves from left 
to right with initial velocity vt„+i — 0.25. The second block 
has constant intensity 20 and it moves from right to left with 
vtg+i = —0.25. We use a small Q because we want the blocks 
to stay in the scene for a long sequence (in the next section, 
we use a larger Q). 

For ReProCS(modCS), at time t = to + 1, we start a 
separate KF for tracking the motion of each object with its true 
location and velocity, i.e. we use gt„+i\to = bto+i^ vto+i], 
T,to+i\to = for sach object. The deletion threshold, add, 
can be set a little lower than the minimum nonzero mag- 
nitude that we would like to correctly detect. Thus we set 
c^dei = a minigjVt | {St)i \ ~ 10a for an a < 1. We used a = 0.1 
and thus set ajei = 1- The addition threshold usually can 
be much lower to ensure most elements get correctly added. 
This can either be indirectly set (we can keep adding more 
elements to Tadd until the condition number of {At)T.^M 8°^^ 
below a "well-conditioned" threshold) or we can just set it to 
a percentage of ctdei- In this work we used this latter approach 
and set ctadd = 0.5Q;dei- To obtain the observed locations 
of the different objects, we first use intensity thresholds to 
obtain their respective supports. The observed location of each 
block is then obtained as the median of its estimated support, 
which is more robust than ( |22] ) if there are occasionally some 
extra indices far away from the true support set. We use 
R = 4Q = lO-'' as the variance of the observation noise, Wt, 
in ( |23l ). We use a larger R because the intensities of the moving 
objects are small compared to the background and hence the 
support update error is likely to be larger. For both ReProCS 
and ReProCS(modCS), we update Pt for every r = 20 frames 
with a = 0.5(Tj^;^ = 5. For ReProCS, we use 7 = 1 for support 
thresholding in (fTOl ). 

In practice, foreground objects slowly enter the scene and 
the support size of St, \Tt\, increases over time (until they 
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(a) Normalized MSE of Lt from to + 1 to to + 200 
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(c) Recovery results at t = to + 200. The original observed image is Mt = Lt + St- 
Fig. 3: Normalized MSE plots and reconstruction results for the third case in Table [in(d)l where ^ ^ 28.71% 
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(c) Recovered image St at t = to + 100 
Fig. 4: Comparison of ReProCS and ReProCS(modCS) when limited projected measurements are available. Here, \Tt \ = |supp(S'f ) [ ~ 0.51n 
while the number of projected measures is about n — rank(Pt) ~ 0.8n. 



enter the scene completely). Therefore, \Tt\ shall be small 
for the initial frames. Even though we may not have the 
initial location/velocity knowledge, we can do ReProCS for 
the first a few frames, get initial estimates of the support and 
then use intensity based segmentation followed by centroid 
computation to estimate the initial location/velocity of the 
objects. For the future frames when \Tt\ is large, we can 
replace CS by supp-pred-modCS, i.e., use ReProCS(modCS). 

In Fig. |4(a)l we plot the NMSE of St for ReProCS and 
ReProCS(modCS) based on 100 times Monte Carlo averaging. 
We also plot the reconstruction error of PCP and RSL for 
one realization. ReProCS takes about 2.1 seconds per frame 
and ReProCS(modCS) takes about 2.8 seconds per frame. As 
can be seen, with a large j^^, ReProCS(modCS) outperforms 
ReProCS greatly because it utilizes the correlation model of St 
while ReProCS does not. PCP and RSL again do not work. In 
Fig. |4(b)[ we plot the average number of extras and misses in 



Tt\t-i (predicted support) and Tt\t (updated support) used by 
supp-pred-modCS. Clearly, modCS corrects most prediction 
errors. 

D. ReProCS and ReProCS(modCS) on a real background 
sequence with simulated foreground images 

In this experiment, we compare ReProCS and Re- 
ProCS(modCS) with PCP, RSL, and thresholding based back- 
ground subtraction (BS) on a partly-real video sequence. We 
take a 72 X 90 X 1500 real video sequence around a lake 
with slow global background variations, e.g., water waves. 
By arranging each image frame as a n = 72 x 90 = 6480 
dimensional column vector, we get the background sequence, 
\Li, ■ ■ ■ , ii5oo]> which has nonzero mean. The first <o = 1420 
frames serve as the training data. The nonzero background 
mean, denoted by /ig^ is estimated as the empirical mean of 
the training sequence, i.e., /io = {Li + • • • + Lto)/to. We 
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subtract /zq from Lt and then get the initial PCs estimate, , 
by computing a SVD decomposition of [Li~ ^q, ■ ■ ■ Lt„ — jio]. 
We get r = rank(Ao) = 1255 w 0.2n. For 1 < i < to, we 
let Alt = Lt- For t > to, we overlay a simulated foreground 
image, Ot, on the background image Lt- The observed image, 
Mt, is thus {Mt)T, = (Ot)T, and (MOt,<^ = (Lt)T,<=. Thus, 5^ 
follows (|2]i. The foreground image has one 45 x 25 moving 
block with nonzero pixel intensity 200. The foreground block 
moves slowly from left to right following the motion model 
(fT3T l with initial velocity = 0.5 and velocity change 

variance Q = 0.005 (rit is truncated Gaussian noise with 
nt ~ A/'(0,Q) and < \nt\ < 2^/Q). We show Mt, 

Lt, Ot, and St (shown in 2D fashion) at t = to + 20 in Fig. 



5(c) Compared with the experiments in Sec. IVI-BI and Sec. 
VI-CI the magnitudes of most of the nonzero elements of S'f's 
are now larger but they vary over location and time. In this 
case minjgTt \{St)i \ ~ 25. 

Note that it's have nonzero mean /io and Pt estimates the 
PCs of [Li — ^0, - ■ ■ ,Lt — /io] instead of [Li, • • • , Lt]. We 
do ReProCS and ReProCS(modCS) on Mt - fio to estimate 
St and then let Lt = Mt - St- We do Algorithm |4] to update 
Pt for every 10 frames with D [D Lt ~ ijlq] and a = 0.1. 
For ReProCS, we use 7 = 10 for support thresholding in ( fTOl ). 
For ReProCS(modCS), we run a KF in the same way as in 
Sec. Ivra We use R = Q/50 = 10"'^ in ^ since St's now 
have larger magnitude and so we expect the modCS output 
to be more reliable. We set ajei = O.SminigTt |('5't)i| — 20 
and Q!a(j(j = 0.5a(]ei = 10. We do PCP and RSL using the 
entire sequence, [Mi,-- - , A/1500], and get St- The support 
estimation, Tt, is estimated as the 90%-percent energy set of 
St- The foreground estimation is {Ot)f^ = {Mt)f^, {Ot)fc = 
0. BS first subtracts the background mean, /io, from the current 
image observation, Mt, and then estimate Ot by thresholding 
on the difference image, Mt — /io, using a threshold picked in 
the same way for PCP or RSL. 

As can been seen from Fig. |5] the reconstruction errors 
of PCP RSL, BS and ReProCS are all much larger than 
ReProCS(modCS). The recovered image frames at t = 20 
in shown in Fig. |5(c)| In this experiment, the support size 
of St is about \Tt\ « 0.16n and the number of projected 
measurements is about n — r k, 0.8n, which seems to be 
enough for recovering St using ReProCS. However, as shown 
in the figure, ReProCS cannot recover St very well. This is 
because that there is a lot of background variation hence the 
projection matrix. At = {Pt,±y , cannot make the noise Pt 
sufficiently small compared to St- When the noise Pt is large, 
for a given support size \Tt\, more measurements are needed 
for accurate sparse recovery. We did the same experiment but 
with a smaller foreground block of size 25xl9(|rt| ~ 0.07n). 
In that case, ReProCS and ReProCS(modCS) give similar 
reconstruction errors which are much smaller than PCP, RSL, 
and BS (not shown). 

In Fig. |5(c)[ there are some misses and extras in Ot esti- 
mated by ReProCS(modCS). This is because the magnitudes 
of nonzero elements of St vary over location and time. Also, as 
explained earlier, the noise fit may not be sufficiently small as 
compared with the small elements of 5*4. A possible solution 
is to use location and time varying thresholds. Also, using 



the fact that St is block sparse can significantly help improve 
performance. 

VII. Conclusions, Limitations and Future Work 

This work finds the missing link between the recursive 
robust PCA problem and the recursive sparse recovery problem 
in large noise. From the robust PCA perspective, our pro- 
posed solutions, ReProCS and ReProCS(modCS), are robust 
to correlated "outliers" as long as they are sparse. From the 
sparse recovery perspective, they are robust to large "noise" as 
long as the "noise" is spatially correlated enough to have an 
approximately low rank covariance matrix and this covariance 
matrix is either constant or changes slowly with time. Our sec- 
ond solution, ReProCS(modCS), which utilizes the correlation 
model of the sparse part, can handle significantly less sparse 
5t's or significantly larger ranks of the low rank part. 

A limitation of our approach, and of recursive robust PCA 
in general, is that it requires an initial estimate of the PC 
matrix, Pq. To get that we either need to know that a certain 
initial set of frames have no sparse part, or we need to use 
PCP to recover the low rank part and hence estimate Po- 
Additionally, it also requires that the it's are spread out 
enough (not sparse) and that the low dimensional subspace 
in which Lt lies changes slowly over time. ReProCS(modCS) 
also needs to know the structure of the correlation model on 
the sparse part. 

In this work we study the case of recursively recovering 
a sparse vector from full measurements, i.e. Mt = St + Lt, 
when the corrupting noise has large magnitude, but is highly 
spatially correlated (has a low rank covariance matrix). But 
both ReProCS and ReProCS(modCS) will extend directly to 
the more general case of Mt = '^St + Lt where may be a 
fat or a square matrix. The only change will be that we will 
need to use At — {Pt.i^)''^ instead of just At — {Pt,±y in 
both Algorithm [T] and Algorithm |3] Also, we will need the 
assumption that {Pt does not nullify the sparse vectors 
St- 

This work introduces the idea of support-predicted 
modified-CS using one very simple correlation model. Similar 
ideas can be extended to many more general models and 
to many applications other than video surveillance. Also, in 
certain cases, the support change may be slow enough so 
that the previous support estimate may itself be a very good 
prediction of the current support. In this case the correlation 
model update using a KF is not needed. For the video 
application, certain practical issues are discussed in Sec. lIV-E)] 
Moreover, so far we only use support prediction. We can also 
try to use (a) signal value prediction and (b) incorporate spatial 
correlation, e.g. using [18|. Finally, the arguments to obtain 
the conditions under which stability (time-invariant and small 
bound on the error) of support-predicted-modified-CS holds 
need to be formalized in future work. 

Appendix: Model on the support change of xt 

We provide here a realistic generative model for xt and 
hence the low rank part, Lt = Uxt that satisfies the assump- 
tions given in Sec. II-AI The support set of xt, Nt, is a union 
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(c) First row: original image at 4 = to + 20. Second row: recovered foreground image 
Fig. 5: Here, rank(Pt) ~ 0.2n, and \Tt\ = |supp(5't)| « 0.16?i. 



of three disjoint sets At, 'Dt, and 8t, i.e., Nt ~ AtWDtU £t- 
At t = 1, we let Nt = £t and AtUVt = 0. For t > 1, the 
new addition set At Nt \ iVt-i is the set of indices along 
which {xt)At starts to be nonzero. At is nonempty once every 
d frames. The set Vt C {Nt n Nt-i) is the set of indices 
along which {xt)vt decay exponentially to zero. We assume 
that Vt will not get added to Nt at any future time. The set 
£t := Nt n Nt-i \ Vt is the set of indices along with (xt)^^ 
follows a first order autoregressive (AR-1) model. 

Let S be a diagonal matrix with nonnegative diagonal 
elements afs. We model xt as 



Small 9 ensures that new directions get added at a small 
value and increase slowly. {xt)vt decays as 

{xt)vt = fd{xt)vt 
{xt)et follows an AR-1 model with parameter /: 

{Xt)£t = f{xt-l)£t + {Vt)£t 

b) At time t > jd, the variance of {xt)Ajd gradually 
increases as 



xo =0 



(xtU^N{Q,{l-(l-0)f 



2{t-jd)- 



i e Ajd 



Xt 



FtXt-1 +iyt, i^t A/'(0, Qt) 



(27) 



where i^t is independent and identically distributed Gaussian 
noise with zero mean and diagonal covariance matrix Qt; Ft 
and Qt are two diagonal matrices defined as below 



Eventually, the variance of {xt)Ajd converges to (E)^^^. 
We assume this becomes approximately true much before 
t ^ {j + l)d (the next support change time), 
c) At time t > jd, the variance of {xt)vjd decays exponen- 
tially to zero as 



Ft 



OAt 
(.//)£, 
{fdl)v, 
{0)n, 

e{E)A, 
(i-/^)(sk 

{0)v 











The three scalars /, fii, and 9 satisfy < < / < 1 and 
0<6'<1. 

From the model on xt given in dZTl ). we notice the following: 
a) At time t = jd, {xt)At starts with 

{xt)At ^AA(O,0(E)^J. 



We assume that this has approximately decayed to zero 
much before t = {j + l)d. 

For every d frames, at t = jd, the values of xt along the 
new indices. At, become nonzero and the values of xt along 
existing indices, T>t, start to decay exponentially. The values 
of Xt along all other existing indices follow an independent 
AR-1 model. After a short period which is much less than 
d, the variances of xt along the new indices increase to some 
stable values and the values of xt along the decaying indices 
decayed to zero. Therefore, Nt is piecewise constant from 
t = jd + Ad to t = jd + d. 
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Supplementary Material 

The overall ReProCS algorithm is a recursive approach. In 
ReProCS(modCS) the sparse recovery algorithm, supp-pred- 
modCS, is itself recursive. Hence an important question is 
when and why will it be stable (error bounded by a time- 
invariant and small value)? Our simulations given in Sec. |Vl] 
do indicate that it is stable. In this section, we provide the main 
arguments to show this analytically. These will be formalized 
in later work. Everywhere in the discussion below, "bounded" 
means bounded by a time-invariant value. 

First consider supp-pred-modCS. We give here the main 
idea of an induction argument to show its stability. Suppose 
that alt — 1, \\St-i — 5't_i||2 is bounded and small and that 
the location error, \pt~i— Pt-i\t-i \ is also bounded and small. 
Since nt is assumed to be bounded and small, the second 
induction assumption means that \pt ~Pt\t-i\ is also bounded 
and small. It is easy to see then that the same holds for 
the support prediction errors |Aj|j_i| := \Tt \ Tt\t-i\ and 
|Ae,t|t-i| |ft|t_i\rt|. In fact, both |At|t_i| and \KAt-i\ 
are bounded by \pt —pt\t-i\- Using this and arguments similar 
to those in |[35l|, we should be able to argue that the sparse 
recovery and support update step (modified-CS with Add- 
LS-Del) will also result in (i) Tt\t with bounded and small 



number of extras, |Ag and misses, |Aj|j| and (ii) the error 
of the recovered sparse part, \\St — St\i being bounded and 
small. This step will need to assume that (a) the noise seen 
by modified-CS, /3t, is bounded and small; (b) At satisfies a 
certain RIP condition (if |At|j_i| < a and |Ae_t|t_i| < h then 
we will need (52«:+i+a+6 < (\/2 - 1)) EH, JSl); and (c) 
most nonzero elements of St are large enough. In fact using 
this approach, the bound on the final number of extras, | Ae^t|t | 
will be zero. Thus, we will just need to argue that the final 
misses, Aj|t := Tj \ Ti|j, will result in bounded and small 
centroid observation error, ujt := pt,obs — Pt- We show how to 
do this in the next paragraph. This, along with ensuring the 
stability of will ensure bounded and small \pt — Pt\t\- 
Note that we use a KF in this paper, but in general the above 
argument will go through with any stable linear observer 

Tt\T. To obtain a bound on 
i while pt,obs = Y^ieT 
r U A and so 



Let T := T^u. and A 
Since the final number of extras is zero, Tt 



i-t\t 

\ujt\, notice that pt 



\Tt\ = |T| + |A|.Thus, (|T|-f 
so \T\{pt-pt,obs) = EjeA(« 



and \{i — pt)\ < w for all 

l^^tl = \Pt -Pt,obs| < 



A|)pt-|r|pt^obs 

-Pt). Using (El, 
ieTt. Thus, 

\^t\t\w 



2w + 1 - lAt 



= E^eA i and 
\Tt\=2w + l 



(28) 



In practice, if there are extras (thresholds are not set high 
enough to ensure zero extras), then we can show that 



< lA 



w 



t\t\ 



2u; + l-|A 



+ |A 



Pt\ 



t\t\ 



e,t\t\ 



2w 



(29) 

The above arguments, once formalized, will show that St — 
St is bounded and small. Since Lt^Lt = St~St, the same will 
hold for Lt — Lt. But the above arguments require assuming 
that the "noise" seen by modCS, Pt = j_Lt is bounded and 
small. To show this we will need to show that our recursive 



PCA algorithm given in the Sec |V] is accurate enough, even 
when we use Lt instead of Lt. Also, we will need to assume 
that each element of xt (and hence of Lt) is bounded, e.g., 
follows a truncated Gaussian or uniform distribution. 



